Increasing Recall of Lengthening Detection via Semi-Automatic Classification

نویسندگان

  • Simon Betz
  • Jana Voße
  • Sina Zarrieß
  • Petra Wagner
چکیده

Lengthening is the ideal hesitation strategy for synthetic speech and dialogue systems: it is unobtrusive and hard to notice, because it occurs frequently in everyday speech before phrase boundaries, in accentuation, and in hesitation. Despite its elusiveness, it allows valuable extra time for computing or information highlighting in incremental spoken dialogue systems. The elusiveness of the matter, however, poses a challenge for extracting lengthening instances from corpus data: we suspect a recall problem, as human annotators might not be able to consistently label lengthening instances. We address this issue by filtering corpus data for instances of lengthening, using a simple classification method, based on a threshold for normalized phone duration. The output is then manually labeled for disfluency. This is compared to an existing, fully manual disfluency annotation, showing that recall is significantly higher with semiautomatic pre-classification. This shows that it is inevitable to use semi-automatic pre-selection to gather enough candidate data points for manual annotation and subsequent lengthening analyses. Also, it is desirable to further increase the performance of the automatic classification. We evaluate in detail human versus semi-automatic annotation and train another classifier on the resulting dataset to check the integrity of the disfluent non-disfluent distinction.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Integration of Visible Image and LIDAR Altimetric Data for Semi-Automatic Detection and Measuring the Boundari of Features

This paper presents a new method for detecting the features using LiDAR data and visible images. The proposed features detection algorithm has the lowest dependency on region and the type of sensor used for imaging, and about any input LiDAR and image data, including visible bands (red, green and blue) with high spatial resolution, identify features with acceptable accuracy. In the proposed app...

متن کامل

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has been increased. Credit card fraud is a crucial problem in banking and its danger is over increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...

متن کامل

Automatic road crack detection and classification using image processing techniques, machine learning and integrated models in urban areas: A novel image binarization technique

The quality of the road pavement has always been one of the major concerns for governments around the world. Cracks in the asphalt are one of the most common road tensions that generally threaten the safety of roads and highways. In recent years, automated inspection methods such as image and video processing have been considered due to the high cost and error of manual metho...

متن کامل

ساخت نیمه‌خودکار یک پیکره از نظرات غیرمستقیم در دامنه دارو و بکارگیری آن برای تعیین قطبیت نظرات

Opinion mining is a well-known problem in natural language processing that has attracted increasing attention in recent years. Existing approaches have been often focused on identifying direct opinions and ignored indirect ones. However, in some domains such as medical, indirect opinions occur frequently. Therefore, ignoring indirect opinions can lead to the loss of valuable information and not...

متن کامل

Automatic Interpretation of UltraCam Imagery by Combination of Support Vector Machine and Knowledge-based Systems

With the development of digital sensors, an increasing number of high-resolution images are available. Interpretation of these images is not possible manually, which necessitates seeking for practical, fast and automatic solutions to solve the environmental and location-based management problems. The land cover classification using high-resolution imagery is a difficult process because of the c...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017